18 research outputs found
Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model, and is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map.
Comment: RSS 2019 camera ready. Video is available at https://youtu.be/4jcYlTSJF4
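The core DAN idea, composing differentiable modules and training them end-to-end so each can compensate for the others' imperfections, can be sketched in a few lines. The toy two-module pipeline, the affine modules, and the target task below are illustrative assumptions, not the paper's actual architecture (which uses full robot algorithms such as filters and planners as modules).

```python
# Minimal sketch of the DAN idea: two "algorithm" modules composed into one
# differentiable pipeline, trained end-to-end. Each module here is a scalar
# affine map with hand-derived gradients; the loss is placed only on the
# final output, so the modules adapt to one another.

def forward(x, w1, b1, w2, b2):
    """Compose module 1 (perception-like) and module 2 (planner-like)."""
    h = w1 * x + b1          # module 1: an imperfect "model" of the input
    y = w2 * h + b2          # module 2: the downstream decision
    return h, y

def train(samples, lr=0.02, steps=500):
    # samples: (x, target) pairs; target is the desired *final* output
    w1, b1, w2, b2 = 1.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        for x, t in samples:
            h, y = forward(x, w1, b1, w2, b2)
            dy = 2.0 * (y - t)             # dL/dy for L = (y - t)^2
            dw2, db2 = dy * h, dy          # gradients for module 2
            dh = dy * w2                   # backprop into module 1
            dw1, db1 = dh * x, dh
            w1 -= lr * dw1; b1 -= lr * db1
            w2 -= lr * dw2; b2 -= lr * db2
    return w1, b1, w2, b2

# Fit the composed system to y = 3x + 1; neither module alone needs to be
# "correct", only the end-to-end composition.
params = train([(x * 0.5, 3 * (x * 0.5) + 1) for x in range(-4, 5)])
_, y = forward(1.0, *params)
print(round(y, 2))
```

Note that the end-to-end loss never supervises the intermediate value `h`; this is the sense in which the modules "adapt to one another" rather than each being trained in isolation.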
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles
Autonomous vehicle (AV) stacks are typically built in a modular fashion, with
explicit components performing detection, tracking, prediction, planning,
control, etc. While modularity improves reusability, interpretability, and
generalizability, it also suffers from compounding errors, information
bottlenecks, and integration challenges. To overcome these challenges, a
prominent approach is to convert the AV stack into an end-to-end neural network
and train it with data. While such approaches have achieved impressive results,
they typically lack interpretability and reusability, and they eschew
principled analytical components, such as planning and control, in favor of
deep neural networks. To enable the joint optimization of AV stacks while
retaining modularity, we present DiffStack, a differentiable and modular stack
for prediction, planning, and control. Crucially, our model-based planning and
control algorithms leverage recent advancements in differentiable optimization
to produce gradients, enabling optimization of upstream components, such as
prediction, via backpropagation through planning and control. Our results on
the nuScenes dataset indicate that end-to-end training with DiffStack yields
substantial improvements in open-loop and closed-loop planning metrics by,
e.g., learning to make fewer prediction errors that would affect planning.
Beyond these immediate benefits, DiffStack opens up new opportunities for fully
data-driven yet modular and interpretable AV architectures. Project website:
https://sites.google.com/view/diffstack
Comment: CoRL 2022 camera ready
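The mechanism DiffStack relies on, backpropagating a planning loss through the planner into an upstream prediction module, can be illustrated with a planner that has a closed-form, differentiable argmin. The one-variable planner, linear predictor, and training setup below are simplified assumptions for illustration; the paper's stack uses differentiable trajectory optimization on real driving data.

```python
# Hedged sketch: a planner written as an optimization problem whose solution
# is differentiable, so gradients from a planning loss flow back into the
# upstream prediction module.

LAM = 0.1  # control-effort weight in the planner's cost

def predict(theta, feature):
    """Upstream 'prediction' module: a linear forecast."""
    return theta * feature

def plan(p):
    """Planner: argmin_u (u - p)^2 + LAM * u^2, solved in closed form.
    Because the argmin has an explicit formula, it is differentiable in p."""
    return p / (1.0 + LAM)

def train(theta, data, lr=0.1, steps=300):
    # data: (feature, expert_plan) pairs. The loss is on the *plan*, so the
    # predictor learns to avoid exactly those errors that hurt planning.
    for _ in range(steps):
        for feat, u_expert in data:
            u = plan(predict(theta, feat))
            # Chain rule through the planner:
            # dL/dtheta = dL/du * du/dp * dp/dtheta
            dLdu = 2.0 * (u - u_expert)
            dudp = 1.0 / (1.0 + LAM)
            theta -= lr * dLdu * dudp * feat
    return theta

# Expert plans come from predictions p = 2 * feature run through the same
# planner, so the recoverable optimum is theta = 2.
data = [(f, plan(2.0 * f)) for f in (-1.0, -0.5, 0.5, 1.0)]
theta = train(0.0, data)
print(round(theta, 3))
```

The key point is the `dudp` term: without a differentiable planner, the gradient chain would be cut at `plan`, and the predictor could only be trained against prediction-accuracy proxies rather than downstream planning performance.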
Foundation Models for Semantic Novelty in Reinforcement Learning
Effectively exploring the environment is a key challenge in reinforcement
learning (RL). We address this challenge by defining a novel intrinsic reward
based on a foundation model, such as contrastive language image pretraining
(CLIP), which can encode a wealth of domain-independent semantic
visual-language knowledge about the world. Specifically, our intrinsic reward
is defined based on pre-trained CLIP embeddings without any fine-tuning or
learning on the target RL task. We demonstrate that CLIP-based intrinsic
rewards can drive exploration towards semantically meaningful states and
outperform state-of-the-art methods in challenging sparse-reward
procedurally-generated environments.
Comment: Foundation Models for Decision Making Workshop at Neural Information Processing Systems, 202
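A semantic-novelty intrinsic reward of this kind can be sketched with a frozen encoder and an episodic memory of visited embeddings. The stand-in encoder, the nearest-neighbor distance bonus, and the class below are illustrative assumptions; the paper uses frozen CLIP embeddings, and its exact reward definition may differ.

```python
import math

# Hedged sketch of a semantic-novelty bonus: observations are mapped to
# embeddings by a *frozen* encoder (CLIP in the paper; a toy stand-in here),
# and the reward is the distance to the nearest embedding already visited
# this episode. No fine-tuning or learning on the target task.

def toy_embed(obs):
    """Stand-in for a frozen CLIP image encoder: obs -> embedding vector."""
    return (math.sin(obs), math.cos(obs))

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class SemanticNoveltyReward:
    """Episodic novelty bonus computed in embedding space."""
    def __init__(self):
        self.memory = []   # embeddings visited so far this episode

    def __call__(self, obs):
        e = toy_embed(obs)
        # First state gets a maximal default bonus; revisits get ~0.
        r = min((dist(e, m) for m in self.memory), default=1.0)
        self.memory.append(e)
        return r

bonus = SemanticNoveltyReward()
r_first = bonus(0.0)   # unseen state: default (maximal) bonus
r_new = bonus(1.5)     # semantically distant state: large bonus
r_rep = bonus(1.5)     # revisited state: zero bonus
print(r_first, r_new > r_rep)
```

Because the encoder is frozen, "novelty" is measured in a semantically meaningful space rather than in raw pixels, which is what lets the bonus ignore visually different but semantically identical states.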
Receding Horizon Planning with Rule Hierarchies for Autonomous Vehicles
Autonomous vehicles must often contend with conflicting planning
requirements, e.g., safety and comfort could be at odds with each other if
avoiding a collision calls for slamming the brakes. To resolve such conflicts,
assigning importance ranking to rules (i.e., imposing a rule hierarchy) has
been proposed, which, in turn, induces rankings on trajectories based on the
importance of the rules they satisfy. On one hand, imposing rule hierarchies
enhances interpretability but introduces combinatorial complexity to planning;
on the other hand, differentiable reward structures can be
leveraged by modern gradient-based optimization tools, but are less
interpretable and unintuitive to tune. In this paper, we present an approach to
equivalently express rule hierarchies as differentiable reward structures
amenable to modern gradient-based optimizers, thereby achieving the best of
both worlds. We achieve this by formulating rank-preserving reward functions
that are monotonic in the rank of the trajectories induced by the rule
hierarchy; i.e., higher ranked trajectories receive higher reward. Equipped
with a rule hierarchy and its corresponding rank-preserving reward function, we
develop a two-stage planner that can efficiently resolve conflicting planning
requirements. We demonstrate that our approach can generate motion plans at
~7-10 Hz for various challenging road navigation and intersection negotiation
scenarios.
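A rank-preserving reward of the kind described can be sketched with exponential weights: each rule's weight strictly exceeds the combined weight of everything below it, so satisfying a higher-priority rule always outranks any combination of lower-priority ones. The specific rules and the power-of-two weighting below are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch of a rank-preserving reward for a rule hierarchy: rules are
# ordered by importance, and higher-ranked trajectories (those satisfying
# more important rules) always receive higher reward.

def rank_preserving_reward(satisfied):
    """satisfied: list of booleans, most important rule first."""
    K = len(satisfied)
    # Weight 2^(K-1-i) for rule i is strictly greater than the sum of all
    # lower-priority weights (2^(K-2-i) + ... + 1), which is what preserves
    # the lexicographic ranking induced by the hierarchy.
    return sum((2 ** (K - 1 - i)) * int(s) for i, s in enumerate(satisfied))

# Hierarchy: [no collision, stay in lane, comfort]. A trajectory that avoids
# collision but is uncomfortable must beat one that is comfortable and
# in-lane but collides.
safe_rough = rank_preserving_reward([True, False, False])    # weight 4
unsafe_smooth = rank_preserving_reward([False, True, True])  # weight 2 + 1
print(safe_rough > unsafe_smooth)
```

Because this scalar reward is monotonic in the hierarchy-induced rank, it can be handed directly to a gradient-based planner (after smoothing the boolean rule indicators) in place of the combinatorial hierarchy itself.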